Human proinsulin and analogs thereof and method of preparation by microbial polypeptide expression and conversion

Microbial expression of a chimeric gene is used to
produce a polypeptide comprising the amino acid sequence
of human proinsulin, or an analog thereof differing in the "C"
chain portion. A polypeptide so produced contains a sequence
of additional amino acid units sufficient in number to
protect it from bacterial proteases, and has a cleavage site,
e.g. a methionine residue, adjacent the sequence of amino
acid units corresponding to the proinsulin or proinsulin
analog. Cleavage at this site (e.g. by CNBr) generates
proinsulin (or the analog), which is treated in vitro to form the
disulfide bonds between the "A" and "B" chain proteins
characteristic of human insulin. The "C" chain portion is then
excised enzymatically to yield human insulin useful e.g. in
the treatment of diabetes.

The chimeric gene may be synthesised from oligonucleotides
and inserted into a plasmid which is used to
transform a host cell, e.g. E. coli.

HUMAN PROINSULIN AND ANALOGS THEREOF
AND METHOD OF PREPARATION BY MICROBIAL
POLYPEPTIDE EXPRESSION AND
CONVERSION THEREOF TO HUMAN INSULIN

Related Applications

This application is related to and incorporates by
reference the disclosures of European Patent Application
Publications Nos. 0001930 (A.N. 78300597.8) and 0036776
(A.N. 81301227.5).

Field of the Invention 

This invention relates to microbial expression of 
polypeptides. In one aspect, it relates to the preparation of 
genes for the microbially expressible production of 
intermediates useful in the preparation of human insulin. In 
another aspect, it relates to the preparation of human
proinsulin or analogs thereof differing from human proinsulin
in the "C" chain portion. In yet another aspect, it relates to
the preparation of human insulin from the prepared human
proinsulin or an analog thereof.

Background of the Invention

Diabetes, the human condition characterized by a failure of
the pancreas to generate the polypeptide hormone insulin in
sufficient quantities, is currently treated, in severe cases at
least, by injection of insulin derived from the pancreas of
slaughtered animals. Bovine and porcine insulin, in
particular, are used for this purpose.

The use of insulin derived from animals is unsatisfactory
from at least two standpoints. In the first place, the
extraction of insulin from the pancreas of slaughtered animals
is a complex process that requires large quantities of the
organs. Secondly, and more importantly from the diabetic's
point of view, the insulin derived from animal sources is not
chemically identical to human insulin, differing in the
sequence of peptide units. Furthermore, it sometimes contains
non-homologous animal hormones, such as the corresponding
proinsulin, albeit in small quantities. As a result, the
response of patients treated with animal derived insulin is not
as satisfactory as desired. For example, an immune response to
animal insulin is believed to be a source of chronic
complications in certain treatments of diabetes.

Accordingly, there has gone unfilled a long felt need to
have a source of insulin identical chemically to human insulin,
uncontaminated by other biologically active impurities, in
amounts sufficient to permit diabetics to be treated
economically. Complicating this task is the complex chemical
structure of human insulin. Structurally it has two
polypeptide chains referred to as the "A" and "B" chains bound
to each other by disulfide bonds. The A chain, some 21 amino
acid units in length, is bound (crosslinked) to the B chain, a
chain of 30 amino acids, through disulfide bonds between units
of the amino acid cysteine in each chain.

To be maximally effective in humans, the amino acid units
of insulin must be precisely ordered to correspond to that
produced in vivo. However, the complexity of the molecule is
such that conventional methods of chemical synthesis are
unsuited to its preparation, on a commercial scale at least.

Insulin is produced in vivo in the pancreas in the form of
preproinsulin. Preproinsulin is a polypeptide comprising the
21 units of the A chain, the 30 units of the B chain, a
bridging or connecting chain of 35 units referred to as the C
chain and a 24 amino acid "presequence" (Met Ala Leu Trp Met
Arg Leu Leu Pro Leu Leu Ala Leu Leu Ala Leu Trp Gly Pro Asp Pro
Ala Ala Ala) attached to the N-terminal phenylalanine amino
acid beginning the B chain. Proinsulin, lacking the
presequence, is shown in Figure 1 with a methionine amino acid
in place of the presequence. This presequence may participate
in secretion from the cells in which it is produced. As the
preproinsulin is excreted from the islet cells of the pancreas,
the presequence is excised to leave the proinsulin chain. This
chain folds to a structure in which three disulfide bonds are
formed, two of which are between the A and B chain segments of
the proinsulin. The connecting C chain is then excised
proteolytically to leave a residue which is insulin, consisting
of the A and B chains bound together by the disulfide bonds.
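
The chain lengths quoted in the preceding paragraph can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the original disclosure; the names `SEGMENT_LENGTHS` and `chain_length` are illustrative only, and the lengths are simply those stated in the text.

```python
# Segment lengths (in amino acid units) as stated in the text.
SEGMENT_LENGTHS = {"presequence": 24, "B chain": 30, "C chain": 35, "A chain": 21}

# The presequence exactly as quoted in the text, split into residues.
PRESEQUENCE = (
    "Met Ala Leu Trp Met Arg Leu Leu Pro Leu Leu Ala "
    "Leu Leu Ala Leu Trp Gly Pro Asp Pro Ala Ala Ala"
).split()


def chain_length(*segments):
    """Sum the lengths of the named segments."""
    return sum(SEGMENT_LENGTHS[name] for name in segments)


preproinsulin = chain_length("presequence", "B chain", "C chain", "A chain")
proinsulin = chain_length("B chain", "C chain", "A chain")
insulin = chain_length("B chain", "A chain")

print(len(PRESEQUENCE), preproinsulin, proinsulin, insulin)
```

Under these figures proinsulin comes to 86 units, which agrees with the later reference to a cDNA segment coding for amino acids 32-86.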

This application describes a method for obtaining human
insulin and human proinsulin and analogs thereof which differ
from human proinsulin in the sequence of amino acids making up
the C chain. The method utilizes the burgeoning recombinant
DNA technology. The following discussion of elements of the
technology provides background to the detailed description of
the invention.

With the advent of recombinant DNA technology, the
controlled bacterial production of useful polypeptides has
become possible. Already in hand are bacteria modified by this
technology to permit the production of such polypeptide
products as somatostatin (K. Itakura et al., Science 198, 1056
(1977)), the (component) A and B chains of human insulin (D.V.
Goeddel et al., Proc. Nat'l. Acad. Sci. USA 76, 106 (1979)) and
human growth hormone (D.V. Goeddel et al., Nature 281, 544
(1979)). Such is the power of the technology that virtually
any useful polypeptide may be bacterially produced, putting
within reach the controlled manufacture of hormones, enzymes,
antibodies, and vaccines useful against a wide variety of
diseases. The cited materials, which describe in greater
detail the representative examples referred to above, are
incorporated herein by reference, as are other publications
referred to infra, to illuminate the background of the
invention.

The work horse of recombinant DNA technology is the 
plasmid, an extra-chromosomal loop of double-stranded DNA found
in bacteria, oftentimes in multiple copies per bacterial cell. 
Included in the information encoded in the plasmid DNA is that 
required to reproduce the plasmid in daughter cells (i.e., a 
"replicon") and ordinarily, one or more selection 
characteristics, such as resistance to antibiotics, which 
permit clones of the host cell containing the plasmid of 
interest to be recognized and preferentially grown in selective 
media. The utility of plasmids, which can be recovered and 
isolated from the host microorganism, lies in the fact that 
they can be specifically cleaved by one or another restriction 
endonuclease or "restriction enzyme", each of which recognizes
a different site on the plasmidic DNA. Thereafter heterologous
genes or gene fragments may be inserted into the plasmid by 
endwise joining at the cleavage site or at reconstructed ends 
adjacent the cleavage site. 
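
Site-specific cleavage of this kind can be pictured with a short sketch. The recognition sequences below are the well-known ones for the enzymes named later in this document; the sequence `toy`, the helper names, and the treatment of the DNA as a linear top strand (a real plasmid is circular and double-stranded) are simplifying assumptions made for illustration.

```python
# Recognition sequence (5'->3', top strand) and cut offset within the site.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "BamHI": ("GGATCC", 1),    # G^GATCC
}


def find_sites(seq, site):
    """Return every 0-based start index of `site` in `seq`."""
    hits, start = [], seq.find(site)
    while start != -1:
        hits.append(start)
        start = seq.find(site, start + 1)
    return hits


def digest(seq, enzyme):
    """Cut the top strand of a linear `seq` at each site of `enzyme`."""
    site, offset = ENZYMES[enzyme]
    cuts = [i + offset for i in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


toy = "TTACGAATTCGGCATAAGCTTCCGTGGATCCAT"  # hypothetical sequence

for name in ENZYMES:
    print(name, digest(toy, name))
```

Because each enzyme recognizes a different site, the same molecule yields a different pair of fragments for each enzyme, which is what allows a chosen site to be opened for insertion of a heterologous fragment.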



0055945 

As used herein, the term "heterologous" refers to a gene
not ordinarily found in, or a polypeptide sequence ordinarily
not produced by, the host microorganism, whereas the term
"homologous" refers to a gene or polypeptide which is produced
in the host microorganism, such as E. coli. DNA recombination
is performed outside the microorganisms but the resulting
"recombinant" plasmid can be introduced into microorganisms by
a process known as transformation and large quantities of the
heterologous gene-containing recombinant plasmid obtained by
growing the transformant. Moreover, where the gene is properly
inserted with reference to portions of the plasmid which govern
the transcription and translation of the encoded DNA message,
the resulting plasmid or "expression vehicle", when
incorporated into the host microorganism, directs the
production of the polypeptide sequence for which the inserted
gene codes, a process referred to as expression.

Expression is initiated in a region known as the promoter
which is recognized by and bound by RNA polymerase. In some
cases, as in the trp operon discussed infra, promoter
regions are overlapped by "operator" regions to form a combined
promoter-operator. Operators are DNA sequences which are
recognized by so-called repressor proteins which serve to
regulate the frequency of transcription initiation at a
particular promoter. The polymerase travels along the DNA,
transcribing the information contained in the coding strand
from its 5' to 3' end into messenger RNA which is in turn
translated into a polypeptide having the amino acid sequence
for which the DNA codes. Each amino acid is encoded by a
unique nucleotide triplet or "codon" within what may for
present purposes be referred to as the "structural gene", i.e.
that part which encodes the amino acid sequence of the
expressed product. After binding to the promoter, the RNA
polymerase first transcribes nucleotides encoding a ribosome
binding site, then a translation initiation or "start" signal
(ordinarily ATG, which in the resulting messenger RNA becomes
AUG), then the nucleotide codons within the structural gene
itself. So-called stop codons are transcribed at the end of
the structural gene whereafter the polymerase may form an
additional sequence of messenger RNA which, because of the
presence of the stop signal, will remain untranslated by the
ribosomes. Ribosomes bind to the binding site provided on the
messenger RNA, in bacteria ordinarily as the mRNA is being
formed, and themselves produce the encoded polypeptide,
beginning at the translation start signal and ending at the
previously mentioned stop signal. The desired product is
produced if the sequences encoding the ribosome binding site
are positioned properly with respect to the AUG initiator codon
and if all remaining codons follow the initiator codon in
phase. The resulting product may be obtained by lysing the
host cell and recovering the product by appropriate
purification from other microorganism protein.
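
The start signal, in-phase reading and stop signal described above can be sketched in a few lines. This is an illustration only: the sequence `gene` is hypothetical, the helper names are not from the original, and the codon table is the standard genetic code written in one-letter amino acid notation.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def transcribe(coding_strand):
    """The mRNA carries the coding-strand sequence with U in place of T."""
    return coding_strand.replace("T", "U")


def translate(mrna):
    """Read codons in phase from the first AUG up to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3].replace("U", "T")]
        if amino_acid == "*":  # stop signal: the rest stays untranslated
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical message: leader, ATG start, three codons, TAA stop, trailer.
gene = "AGGAGGTTAAATATGTTCGTCAATTAAGCATT"
mrna = transcribe(gene)
print(mrna)
print(translate(mrna))
```

Shifting the start of reading by one or two nucleotides would change every downstream codon, which is why the codons must follow the initiator "in phase".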

Polypeptides expressed through the use of recombinant DNA
technology may be entirely heterologous, as in the case of the
direct expression of human growth hormone, or alternatively may
comprise a heterologous polypeptide and, fused thereto, at
least a portion of the amino acid sequence of a homologous
peptide, as in the case of the production of intermediates for
somatostatin and the components of human insulin. In the
latter cases, for example, the fused homologous polypeptide
comprised a portion of the amino acid sequence for beta
galactosidase. In those cases, the intended bioactive product
is bioinactivated by the fused, homologous polypeptide until
the latter is cleaved away in an extracellular environment.
Fusion proteins like those just mentioned can be designed so as
to permit highly specific cleavage of the precursor protein
from the intended product, as by the action of cyanogen bromide
on methionine, or alternatively by enzymatic cleavage. See,
e.g., G.B. Patent Publication No. 2 007 676 A.
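
Cyanogen bromide cleaves a polypeptide on the carboxyl side of each methionine, so a single methionine placed between carrier and product gives a clean separation, provided the product itself contains no methionine. The sketch below illustrates this with one-letter amino acid codes; the `carrier` string is hypothetical, and `product_start` is the first ten residues of the insulin B chain as encoded by the synthetic gene described later in this document. The chemical conversion of the methionine itself is not modelled.

```python
def cnbr_cleave(protein):
    """Split a one-letter protein sequence after every methionine (M)."""
    fragments, current = [], ""
    for residue in protein:
        current += residue
        if residue == "M":
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


carrier = "TKLSEAGHVR"        # hypothetical carrier portion, no methionine
product_start = "FVNQHLCGSH"  # first ten residues of the insulin B chain
fusion = carrier + "M" + product_start

print(cnbr_cleave(fusion))
```

In this toy case the carrier is released with the methionine at its carboxyl end and the product fragment begins at its own first residue.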

Human insulin has hitherto been obtained employing
techniques of recombinant DNA technology. The process used a
synthetic gene for the A chain which is expressed in E. coli
and a separate synthetic gene for the B chain which is
expressed in another E. coli. D.V. Goeddel et al., Proc. Nat.
Acad. Sci., USA, 76, 106 (1979). The two chains are obtained
as chimeric polypeptides (proteins) comprising the desired
sequence of amino acids (either the A or B chain sequence)
bound to another section of carrier polypeptide designed to
protect the desired sequence from proteases in the E. coli.
The chimeric proteins have a selective cleavage site adjacent 
the desired polypeptide sequence of the A or B chain which 
permits separation of the desired sequence from the carrier 
polypeptide. Isolation of the two sequences is followed by the 
formation in vitro of the disulfide bonds. 

This process is necessarily complicated by the fact that 
two distinct genetically modified bacterial strains must be 
obtained and maintained. Further, the prior process requires 
separate isolation of the A and B chain and the crosslinking of
the two chains by means of the formation of the disulfide bonds 
without the aid in orientation provided by an intact C chain. 

The present application describes a process for the
construction of a single gene to express, in a single
microorganism, a chimeric protein which includes a complete
human proinsulin polypeptide or an analog thereof differing
from human proinsulin in only the amino acid sequence of the C
chain. Human insulin can be cleanly excised from these
polypeptides after in vitro formation of the disulfide
crosslinks between the A and B chains.

By this process, proinsulin, and analogs thereof, can be
directly obtained in substantially pure form and free of
biologically active impurities. Similarly, the proinsulin can
be effectively processed to obtain insulin chemically identical
to human insulin similarly free of biologically active
impurities thus promising a more effective treatment of human
diabetes than possible using animal derived insulin.

Summary of the Invention 

The present invention provides a method for obtaining human
insulin by means of a chimeric polypeptide comprising the polypeptide
sequence of human proinsulin, or an analog thereof differing from the
polypeptide sequence of human proinsulin in the sequence of amino
acids comprising the C chain, fused to additional protein or
protein fragment, there being a selective cleavage site which
permits cleavage of the proinsulin or its analog from the additional
protein or fragment. The cleaved proinsulin product may then be
caused to orient by formation of the characteristic insulin A
and B chain disulfide crosslinks and the crosslinked insulin
precursor may then be excised from the "C" chain carrier.

The "C" chain (or, hereinafter, bridging chain) of amino 
acid units of the analog proinsulins made according to the 
present invention may comprise as few as 2 amino acid units. 
The identity and sequence of amino acid units intermediate the 
ends of the bridging chains in analogs of human proinsulin are 
not particularly significant. However, the end units thereof 
must be units which permit facile excision of the bridging 
chain from the A and B chains of human insulin. Preferably,
excision occurs after the proinsulin molecule has been cleaved
from the additional protein of the chimeric polypeptide and, most
preferably, after the thus cleaved proinsulin molecule has been
folded and the disulfide links between the A and B chains
characteristic of human insulin have been formed.
Preferably, the bridging chain has sites which permit its
excision by enzymatic means. Preferred for this purpose
are Arg-Arg and Lys-Arg units on the bridging chain which are
adjacent the terminal -COOH end of the B chain and terminal
-NH2 end of the A chain, respectively, as is found in human
proinsulin itself. Also preferred are two Arg-Arg units
linking the B and A chains.
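
The role of the end units of the bridging chain can be illustrated with a minimal sketch using one-letter amino acid codes. The B chain string is the one encoded by the synthetic oligonucleotides of Table 1; the A chain and native bridging chain strings are the generally published human sequences and are not taken from this text. `ANALOG_BRIDGE` and `excise_bridge` are hypothetical names. The sketch only slices strings by position and checks the end units; it does not model the enzymes or any trimming of the basic residues.

```python
B_CHAIN = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"  # 30 units
A_CHAIN = "GIVEQCCTSICSLYQLENYCN"           # 21 units
# Arg-Arg next to the B chain, Lys-Arg next to the A chain: 35 units in all.
NATIVE_BRIDGE = "RR" + "EAEDLQVGQVELGGGPGAGSLQPLALEGSLQ" + "KR"
ANALOG_BRIDGE = "RRGSRR"  # hypothetical short analog with two Arg-Arg ends


def excise_bridge(proinsulin, b_len=30, a_len=21):
    """Return (B chain, bridging chain, A chain) split by position."""
    b_chain = proinsulin[:b_len]
    a_chain = proinsulin[-a_len:]
    bridge = proinsulin[b_len:len(proinsulin) - a_len]
    if not (bridge[:2] == "RR" and bridge[-2:] in ("KR", "RR")):
        raise ValueError("bridging chain lacks the expected dibasic end units")
    return b_chain, bridge, a_chain


for bridge in (NATIVE_BRIDGE, ANALOG_BRIDGE):
    b_chain, cut_out, a_chain = excise_bridge(B_CHAIN + bridge + A_CHAIN)
    print(len(b_chain), len(cut_out), len(a_chain))
```

The interior of the bridging chain plays no part in the check, which mirrors the statement that the identity of the intermediate units is not particularly significant.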

The chimeric proteins of the present invention are obtained 
by expression of a heterologous structural gene for the 
proinsulin or analog in a recombinant microbial cloning vehicle 
in which the gene is in reading phase with a DNA sequence 
coding for an additional protein portion of the chimeric protein
and the cleavage site. In preferred embodiments of the
invention, the cleavage site is methionine at the N-terminal of
the proinsulin which permits cleavage using cyanogen bromide. 

The additional protein can vary but the preferred
additional protein is methionine amino acid or the presequence
of preproinsulin or a portion thereof or beta-galactosidase
or a substantial portion thereof or a portion of the amino acid
sequence encoded by a fragment of the trp leader polypeptide 
gene fused to a portion of the trp E polypeptide gene or the 
trp D polypeptide. The added, ultimately superfluous portion 
of the chimeric protein is selected to provide protection from 
bacterial proteases which might otherwise digest the proinsulin. 

The preferred recombinant microbial cloning vehicle is a 
modified, preferably bacterial, plasmid containing the 
structural gene in a reading phase with the DNA sequence which
codes for the added, superfluous portion of the chimeric 
polypeptide and the selective cleavage site. 

The manner in which these and other objects and advantages
of the invention are achieved will be apparent to those skilled
in the art after consideration of the following description of
the preferred embodiments and the illustrations of Figs. 1-6.

Brief Description of the Drawings

Figure 1 shows the nucleotide sequence of a gene for human
proinsulin.

Figure 2 illustrates a scheme for obtaining a plasmid 
containing a fragment of the gene of Figure 1 which was derived 
using reverse transcription from mRNA. 

Figure 3 illustrates a scheme for obtaining a plasmid
containing the gene of Figure 1 suitable for transformation
into, e.g., E. coli for expression of human proinsulin.

Figure 4 illustrates the analysis by HPLC chromatography of
human proinsulin obtained by expression from an, e.g., E. coli
transformant containing the plasmid derived from the scheme of
Figure 3.

Figure 5 illustrates segments of a gene for expression of 
an analog of human proinsulin differing from human proinsulin 
in the amino acid sequence of the C, bridging chain. 

Figure 6 illustrates a scheme for assembling a plasmid 
containing a gene for transformation into, e.g., E. coli for
expression of an analog of human proinsulin. 

Detailed Description 

A. Preparation of Human Proinsulin 

1. Preparation of Synthetic Gene Coding For The 32
N-Terminal Amino Acids of Proinsulin
a. Oligonucleotide Synthesis

A series of 18 oligonucleotides, short nucleotide
chains 10-12 units in length shown in Table 1, were prepared as
a first step to the construction of a gene coding for the first
32 amino acids of proinsulin, the amino acid sequence of which
is shown in Fig. 1 as a part of the nucleotide sequence of the
entire gene ultimately constructed for use in the present
invention for the expression of human proinsulin. The
individual nucleotides in the gene are identified by the
letters A, T, C or G representing the bases adenine, thymine,
cytosine or guanine which distinguish one nucleotide from
another.

TABLE 1

Oligonucleotides:

H1     AATTCATGTT
H2     CGTCAATCAGCA
H3     CCTTTGTGGTTC
H4     TCACCTCGTTGA
H5     TTGACGAACATG
H6     CAAAGGTGCTGA
H7     AGGTGAGAACCA
H8     AGCTTCAACG
B1     AGCTTTGTAC
B2     CTTGTTTGCGGT
B3     GAACGTGGTTTC
B4     TTCTACACTCCT
B5'    AAGACTCGCC
B6     AACAAGGTACAA
B7     ACGTTCACCGCA
B8     GTAGAAGAAACC
B9     AGTCTTAGGAGT
B10'   GATCCGGCG

The synthetic nucleotides are shown between
brackets in Fig. 1. These oligonucleotides were
synthesized by the triester method: R. Crea et al., Proc. Nat.
Acad. Sci., USA, 75, 5765 (1978), K. Itakura et al., J. Biol.
Chem., 250, 4592 (1975) and K. Itakura et al., J. Am. Chem.
Soc., 97, 7327 (1975). Some of these are oligonucleotides
which were used in a gene coding for the B chain of human
insulin previously described by Crea et al., Proc. Nat. Acad.
Sci., USA, 75, 5765 (1978), and Goeddel et al., Proc. Nat. Acad.
Sci., USA, 76, 106 (1979). Two of the
synthetic nucleotides (B5' and B10') were synthesized for this
project; the others were prepared according to Crea et al.
(supra). The two new oligonucleotides, also prepared according
to Crea et al., incorporate restriction enzyme recognition
sites for HpaII and terminal BamHI, the latter used for
cloning. The other end of the gene contains a sticky end of an
EcoRI site for cloning purposes.
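
The coding content of the synthetic oligonucleotides can be checked by joining those that appear to make up the upper strand and translating the result. This is a sketch under stated assumptions: that H1-H4 and B1-B5', read in the listed order, form the coding strand, and that the leading AATTC is the EcoRI sticky end rather than coding sequence. The helper names are illustrative, and the codon table is the standard genetic code in one-letter notation.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Oligonucleotides assumed to form the upper (coding) strand, from Table 1.
TOP_OLIGOS = {
    "H1": "AATTCATGTT", "H2": "CGTCAATCAGCA", "H3": "CCTTTGTGGTTC",
    "H4": "TCACCTCGTTGA", "B1": "AGCTTTGTAC", "B2": "CTTGTTTGCGGT",
    "B3": "GAACGTGGTTTC", "B4": "TTCTACACTCCT", "B5'": "AAGACTCGCC",
}


def translate(dna):
    """Translate complete codons of `dna`, stopping at any stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


top_strand = "".join(TOP_OLIGOS.values())
coding = top_strand[len("AATTC"):]  # drop the EcoRI sticky end
protein = translate(coding)
print(protein, len(protein))
```

Read this way, the strand encodes a methionine, thirty B chain units and a final arginine, 32 units in all, consistent with the heading of this section and with the methionine codon preceding the proinsulin sequence.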

b. Joining of Synthetic Oligonucleotides 

The eight oligonucleotides H1-H8 were used 
previously to construct the left half of the B chain gene. This 
was used in this process and is described by Goeddel et al.,
Proc. Natl. Acad. Sci. USA 76, 106 (1979). It contains the
codons for the 1-13 amino acids of the B chain gene and a
methionine unit at the N-terminal, used later to cleave the
proinsulin from bacterially expressed chimeric protein using
cyanogen bromide (CNBr).

The right half of the B chain gene was obtained
from the oligonucleotides B1, B2, B3, B4, B5', B6,
B7, B8, B9 and B10' by ligation using T4 ligase and
a technique described by Goeddel et al. (supra). The gene
fragment produced codes for the 14-30 amino acid units of the B
chain and the first unit, Arg, of the bridging chain.
Incorporated into the gene sequence is an HpaII restriction
enzyme site in the same reading frame and location as an HpaII
site in the human insulin gene. After purification of the
ligated gene fragment by polyacrylamide gel electrophoresis,
and elution of the largest DNA band, the fragment was inserted
into the plasmid pBR322 that had been cleaved with restriction
endonucleases HindIII and BamHI, thereby utilizing the HindIII
and BamHI sites on the synthetic gene fragment. The DNA was
inserted into E. coli 294 (ATCC No. 31446) by transformation.
One plasmid, pB3', recovered from an ampicillin resistant,
tetracycline sensitive clone was found to possess the desired
nucleotide sequence according to the method of A.M. Maxam
et al., Proc. Natl. Acad. Sci. USA 74, 560 (1977).
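
Ligation of the synthetic oligonucleotides relies on each lower-strand oligonucleotide pairing across a junction between two upper-strand ones. The sketch below checks this for the sequences of Table 1, assuming that H1-H4 and B1-B5' form the upper strand and H5-H8 and B6-B9 the lower strand; that assignment, and the helper names, are inferences made for illustration. B10' is left out because, as listed, only part of it pairs with the upper strand while the remainder would form the BamHI end.

```python
TOP = ["AATTCATGTT", "CGTCAATCAGCA", "CCTTTGTGGTTC", "TCACCTCGTTGA",
       "AGCTTTGTAC", "CTTGTTTGCGGT", "GAACGTGGTTTC", "TTCTACACTCCT",
       "AAGACTCGCC"]  # H1-H4, B1-B5'
BOTTOM = {"H5": "TTGACGAACATG", "H6": "CAAAGGTGCTGA", "H7": "AGGTGAGAACCA",
          "H8": "AGCTTCAACG", "B6": "AACAAGGTACAA", "B7": "ACGTTCACCGCA",
          "B8": "GTAGAAGAAACC", "B9": "AGTCTTAGGAGT"}


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


top_strand = "".join(TOP)

# Positions in the upper strand where one upper oligonucleotide ends.
junctions = []
position = 0
for oligo in TOP[:-1]:
    position += len(oligo)
    junctions.append(position)

# Where each lower-strand oligonucleotide pairs with the upper strand.
pairing = {name: top_strand.find(revcomp(seq)) for name, seq in BOTTOM.items()}

for name, start in pairing.items():
    end = start + len(BOTTOM[name])
    bridged = [j for j in junctions if start < j < end]
    print(name, start, end, "bridges junction at", bridged)
```

Each lower-strand oligonucleotide is found to straddle exactly one junction, which is the staggered arrangement that holds neighbouring oligonucleotides together for the ligase.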

From the two plasmids pBH1 and pB3', two DNA
fragments were recovered, a 46 base pair EcoRI to HindIII
fragment from pBH1, and a 58 base pair HindIII to BamHI
fragment from pB3'.

The two fragments were ligated together to produce
a fragment having an EcoRI site and a BamHI site. This
fragment was inserted in plasmid pBR322 which had been treated
with EcoRI and BamHI restriction endonucleases using the method
described in Goeddel et al., Proc. Nat. Acad. Sci. USA, 76, 106
(1979) and cloned in E. coli K-12 strain 294 (ATCC No. 31446)
to provide the plasmid pIB3. After cloning, the plasmid pIB3
was cleaved with EcoRI and HpaII restriction endonucleases to
recover the synthetic gene fragment (Fragment 1, Figure 3)
containing the codons for the N-terminal proinsulin amino acids
preceded by a methionine codon as shown in Figs. 1 and 3. The
synthetic gene was isolated by polyacrylamide gel
electrophoresis.

2. Isolation of a cDNA Gene Coding For the 55 C-Terminal
Amino Acids of Human Proinsulin

The scheme for obtaining the cDNA gene is schematically
shown in Fig. 2.

A decanucleotide was synthesized containing the
recognition sequence for BamHI endonuclease, to which was added
a 3' polythymidylic acid tract of approximately 20 residues.
Its sequence is pCCGGATCCGGTT₁₈T. This oligonucleotide was
used to prime AMV reverse transcriptase for cDNA synthesis.
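
The structure of the primer can be sketched as follows, reading the printed formula as the decanucleotide CCGGATCCGG followed by a tract of 20 thymidylate residues; that reading of the formula, and the helper names, are assumptions made for illustration.

```python
BAMHI_SITE = "GGATCC"
DECANUCLEOTIDE = "CCGGATCCGG"
POLY_T_LENGTH = 20  # "approximately 20 residues" in the text

primer = DECANUCLEOTIDE + "T" * POLY_T_LENGTH


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def trailing_run(seq, base):
    """Length of the run of `base` at the 3' end of `seq`."""
    return len(seq) - len(seq.rstrip(base))


print(primer)
print("BamHI site at index", primer.find(BAMHI_SITE))
print("self-complementary decanucleotide:", revcomp(DECANUCLEOTIDE) == DECANUCLEOTIDE)
print("3' T tract length:", trailing_run(primer, "T"))
```

The T tract at the 3' end is the part that can pair with the polyA tail of the mRNA, while the decanucleotide carries the BamHI site into the resulting cDNA.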

The primer was prepared using terminal deoxynucleotidyl
transferase (Enzo Biochem, 200 units) with one µmole of the
BamHI decanucleotide in a reaction volume of 0.6 ml containing
1.5 x 10^-4 TTP. The reaction was conducted at 37°C for one
hour in a buffer system described by A. Chang et al., Nature,
275, 617 (1978).

Human insulinoma polyA tissue (2.5 µg) provided by the
Institute fuer Diabetesforschung, Muenchen, West Germany
(Dr. Wolfgang Kemmler) containing mRNA isolated by the process
of Ullrich et al., Science, 196, 1313 (1977) was converted to
double stranded cDNA by a procedure according to Wickens et al.,
J. Biol. Chem., 253, 2483 (1978). Thus, 80 µl containing 15 mM
Tris/HCl (pH 8.3 at 42°C), 21 mM KCl, 8 mM MgCl2, 30 mM
β-mercaptoethanol, 2 mM of the primer dCCGGATCCGGTT₁₈T, and 1
mM dNTPs was preincubated at 0°C. Then 40 units of AMV reverse
transcriptase were added and the mixture incubated for 15
minutes at 42°C.

The complementary cDNA strand was synthesized in a
volume of 150 µl containing 25 mM Tris/HCl (pH 8.3), 35 mM KCl,
4 mM MgCl2, 15 mM β-mercaptoethanol and 9 units of DNA
polymerase I (Klenow fragment). The mixture was incubated at
15°C for 90 minutes followed by 15 hours at 4°C. S1 nuclease
digestion was then performed for 2 hours at 37°C using 1000
units of S1 nuclease (Miles Laboratories) as described by
Wickens et al., supra. The double stranded cDNA (0.37 µg) was
subjected to electrophoresis on an 8 percent polyacrylamide
gel. DNA fragments larger than 500 base pairs were eluted.
Oligodeoxycytidylic acid residues were added to the 3' ends of
the fragments using terminal deoxynucleotidyl transferase by
the procedure of J.Y. Maizel Jr., Meth. Virol., 5, 180 (1971).
The dC tailed cDNA fragments were annealed to pBR322 that had
been cleaved with the restriction endonuclease PstI and tailed
with deoxyguanidylic acid using terminal deoxynucleotidyl
transferase. The resulting plasmids were transformed into
E. coli K-12 strain 294 and cloned. Colonies resistant to
tetracycline but sensitive to ampicillin were isolated and
screened for plasmids having three sites cleavable by the
restriction endonuclease PstI indicative of the presence of the
gene for insulin. Sures et al., Science, 208, 57 (1980).

One plasmid, pHI104, containing a 600 base pair insert 
and giving the anticipated PstI restriction pattern was 
determined to contain a site cleavable by BamHI between the 3'
polyA and the polyGC introduced during the cDNA preparation.
Some of the nucleotide sequence of the insert is shown in
Fig. 1. This sequence differs slightly from that previously
reported by I. Sures et al., Science, 208, 57 (1980) and G.
Bell et al., Nature, 282, 525 (1979), having an AT base pair
where underlined rather than a CG pair, because the mRNA used
was from tissue isolated from a different individual. The
resistance to antibiotics conferred on a bacterium by this
plasmid is indicated by the marker Ap^s for ampicillin
sensitivity and Tc^r for tetracycline resistance.

3. Assembly Of A Gene Coding For Human Proinsulin 

The scheme used for assembling a gene coding for human 
proinsulin is shown in Fig. 3. 

The synthetic gene segment coding for the first 31
amino acids of proinsulin, fragment 1 in Fig. 3, was recovered
from 50 µg of the plasmid pIB3 using the restriction
endonucleases EcoRI and HpaII as described above. This fragment
also contains the codon ATG for methionine in place of the
"presequence" of preproinsulin. Introduction of a methionine
unit at this point permits the polypeptide ultimately expressed
to be cleaved at this point by cyanogen bromide (CNBr) to
separate the proinsulin from the residue of the polypeptide
which served to protect the proinsulin portion from bacterial
proteases.

The cDNA gene segment coding for amino acids 32-86, as
well as the translation stop codons and the 3' untranslated
region of the mRNA, was recovered from 40 µg of the plasmid
pHI104 by treatment first with BamHI and then HpaII as shown in
Fig. 3 as fragment 2.

The two fragments were isolated by polyacrylamide
electrophoresis followed by electroelution. The gene fragments
were joined by treatment with T4 DNA ligase in 20 µl ligase
buffer (Goeddel et al., Proc. Nat. Acad. Sci. USA, 76, 106
(1979)) at 4°C for 24 hrs. The mixture was diluted with 50 µl
H2O, extracted with phenol, then chloroform and then
precipitated with ethanol.

The resulting DNA was treated with BamHI and EcoRI to
regenerate these sites and remove gene polymers. The assembled
proinsulin gene was isolated by polyacrylamide gel
electrophoresis and ligated using T4 ligase to the plasmid pBR322
which had previously been treated with EcoRI and BamHI. The
resulting DNA was transformed into E. coli K-12 strain 294 and
cloned. Colonies were screened using the plasmid conferred
antibiotic resistance markers. The desired clones were
tetracycline-sensitive (Tc^s) and ampicillin-resistant (Ap^r).
Plasmid pHI3 was isolated from one such colony and the
proinsulin gene was characterized by nucleotide sequence analysis
and found to have the sequence shown in Fig. 1.
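
The marker-based screening used in this section and in the preceding one can be summarized in a small sketch. The expected phenotypes are those stated in the text (tetracycline-resistant, ampicillin-sensitive for the PstI cDNA clones; tetracycline-sensitive, ampicillin-resistant for the EcoRI/BamHI clones). The colony records and the names `EXPECTED_PHENOTYPE` and `screen` are hypothetical.

```python
# Cloning site(s) -> (ampicillin resistant?, tetracycline resistant?)
EXPECTED_PHENOTYPE = {
    "PstI": (False, True),         # cDNA clones such as pHI104
    "EcoRI-BamHI": (True, False),  # assembled gene clones such as pHI3
}


def screen(colonies, cloning_site):
    """Return the names of colonies showing the expected marker phenotype."""
    wanted = EXPECTED_PHENOTYPE[cloning_site]
    return [c["name"] for c in colonies if (c["ap_r"], c["tc_r"]) == wanted]


# Hypothetical plate results.
colonies = [
    {"name": "c1", "ap_r": True, "tc_r": True},
    {"name": "c2", "ap_r": True, "tc_r": False},
    {"name": "c3", "ap_r": False, "tc_r": True},
]

print(screen(colonies, "EcoRI-BamHI"))
print(screen(colonies, "PstI"))
```

A colony that keeps both resistances, such as `c1` here, matches neither pattern and would be passed over in either screen.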

4. Construction of a Plasmid Designed to Express a
Chimeric Protein Containing the Human Proinsulin Peptide

Plasmid pBRH1 (R.I. Rodriguez et al., Nucleic Acids
Research 6, 3257-3287 (1979)) expresses ampicillin resistance and
contains the gene for tetracycline resistance but, there being
no associated promoter, does not express that resistance. The
plasmid is accordingly tetracycline sensitive. By introducing
a promoter-operator system in the EcoRI site, the plasmid can
be made tetracycline resistant.

Plasmid pGM1 carries the E. coli tryptophan operon
containing the deletion LE1413 (G.F. Miozzari et al., (1978)
J. Bacteriology 1457-1455) and hence expresses a fusion
protein comprising the first 6 amino acids of the trp leader
and approximately the last third of the trp E polypeptide
(hereinafter referred to in conjunction as LE'), as well as the
trp D polypeptide in its entirety, all under the control of the
trp promoter-operator system. The plasmid, 20 µg, was digested
with the restriction enzyme PvuII which cleaves the plasmid at
five sites. The gene fragments 2 were next combined with EcoRI
linkers (consisting of a self complementary oligonucleotide 3
of the sequence: pCATGAATTCATG) providing an EcoRI cleavage
site for a later cloning into a plasmid containing an EcoRI
site. The 20 µg of DNA fragments 2 obtained from pGM1 were
treated with 10 units of T4 DNA ligase in the presence of 200
picomoles of the 5'-phosphorylated synthetic oligonucleotide
pCATGAATTCATG and in 20 µl T4 DNA ligase buffer (20 mM tris,
pH 7.6, 0.5 mM ATP, 10 mM MgCl2, 5 mM dithiothreitol) at 4°C
overnight. The solution was then heated 10 minutes at 70°C to
halt ligation. The linkers were cleaved by EcoRI digestion and
the fragments, now with EcoRI ends, were separated using 5
percent polyacrylamide gel electrophoresis (hereinafter
"PAGE") and the three largest fragments isolated from the gel
by first staining with ethidium bromide, locating the fragments
with ultraviolet light, and cutting from the gel the portions
of interest. Each gel fragment, with 300 microliters 0.1xTBE,
was placed in a dialysis bag and subjected to electrophoresis
at 100 V for one hour in 0.1xTBE buffer (TBE buffer contains:
10.8 gm tris base, 5.5 gm boric acid, 0.09 gm Na2EDTA in 1
liter H2O). The aqueous solution was collected from the
dialysis bag, phenol extracted, chloroform extracted and made
0.2 M NaCl, and the DNA recovered in H2O after EtOH
precipitation.

pBRH1 was digested with EcoRI and the enzyme removed by 
phenol extraction followed by chloroform extraction and 
recovered in water after ethanol precipitation. The resulting 
DNA molecule was, in separate reaction mixtures, combined with 
each of the three DNA fragments obtained above and ligated with 
T4 DNA ligase as previously described. The DNA present in 
the reaction mixture was used to transform competent E. coli 
K-12 strain 294 (K. Backman et al., Proc Nat'l Acad Sci USA 73, 
4174-4198 [1976]) (ATCC no. 31446) by standard techniques (V. 
Hershfield et al., Proc Nat'l Acad Sci USA 71, 3455-3459 
[1974]) and the bacteria plated on LB plates containing 
20 µg/ml ampicillin and 5 µg/ml tetracycline. Several 
tetracycline-resistant colonies were selected, plasmid DNA 
isolated and the presence of the desired fragment confirmed by 
restriction enzyme analysis. The resulting plasmid was 
designated pBRHtrp. 

pBRHtrp was digested with EcoRI restriction enzyme and 
the resulting fragment isolated by PAGE and electroelution. 
EcoRI-digested plasmid pSOM11 (K. Itakura et al., Science 198, 
1056 (1977); G.B. patent publication no. 2 007 676 A) was 
combined with this fragment. The mixture was ligated with T4 
DNA ligase as previously described and the resulting DNA 
transformed into E. coli K-12 strain 294 as previously 
described. Transformant bacteria were selected on 
ampicillin-containing plates. Resulting ampicillin-resistant 
colonies were screened by colony hybridization (M. Gruenstein 
et al., Proc Nat'l Acad Sci USA 72, 3951-3965 [1975]) using as 
a probe the trp promoter-operator-containing fragment isolated 
from pBRHtrp, which had been radioactively labelled with 
P32. Several colonies shown positive by colony hybridization 
were selected, plasmid DNA was isolated and the orientation of 
the inserted fragments determined by restriction analysis 
employing restriction enzymes BglII and BamHI in double 
digestion. The plasmid carried by the selected E. coli 294 
transformant was designated pSOM7Δ2. 

Plasmid pBR322 was HindIII digested and the protruding 
HindIII ends in turn digested with S1 nuclease. The S1 
nuclease digestion involved treatment of 10 µg of 
HindIII-cleaved pBR322 in 30 µl S1 buffer (0.3 M NaCl, 1 mM 
ZnCl2, 25 mM sodium acetate, pH 4.5) with 300 units S1 
nuclease for 30 minutes at 15°C. The reaction was stopped by 
the addition of 1 µl of 30X S1 nuclease stop solution (0.8 M 
tris base, 50 mM EDTA). The mixture was phenol extracted, 
chloroform extracted and ethanol precipitated, then EcoRI 
digested as previously described and the large fragment 1' 
obtained by PAGE procedure followed by electroelution. The 
fragment obtained has a first EcoRI sticky end and a second, 
blunt end whose coding strand begins with the nucleotide 
thymidine. 

16 µg of plasmid pSOM7Δ2 was diluted into 200 µl of buffer 
containing 20 mM Tris, pH 7.5, 5 mM MgCl2, 0.02 percent NP40 
detergent, 100 mM NaCl and treated with 0.5 units EcoRI. After 
15 minutes at 37°C, the reaction mixture was phenol extracted, 
chloroform extracted and ethanol precipitated and subsequently 
digested with BglII. The larger resulting fragment 3' was 
isolated by the PAGE procedure followed by electroelution. 
This fragment contains the codons "LE'(p)" for the proximal end 
of the LE' polypeptide, i.e., those upstream from the BglII 
site. The fragment 2' was next ligated to the fragment 4'. 
Fragment 4' is prepared by successive digestion of pTrp24 
(prepared upon EcoRI digest of pThα1 (Biochemistry 80, 6096 
(1980)) followed by Klenow polymerase I reaction to blunt the 
EcoRI residues. BglII digestion creates a linear fragment which 
was recircularized by reaction with the LE' containing BglII 
sticky and blunt ends) with BglII and EcoRI, followed by PAGE 
and electroelution. The ligation was done in the presence of 
T4 DNA ligase to form the plasmid pSOM7Δ2Δ4, which was 
transformed into E. coli strain 294, as previously described. 

Plasmid pSOM7Δ2 was BglII digested and the BglII 
sticky ends resulting made double stranded with the Klenow 
polymerase I procedure using all four deoxynucleotide 
triphosphates. EcoRI cleavage of the resulting product 
followed by PAGE and electroelution of the small fragment 2' 
yielded a linear piece of DNA containing the tryptophan 
promoter-operator and codons of the LE' "proximal" sequence 
upstream from the BglII site ("LE'(p)"). The product had an 
EcoRI end and a blunt end resulting from filling in the BglII 
site. However, the BglII site is reconstituted by ligation of 
the blunt end of the fragment 2' to the blunt end of fragment 
1'. Thus, the two fragments were ligated in the presence of 
T4 DNA ligase to form the recircularized plasmid pHKY10 
which was propagated by transformation into competent E. coli 
strain 294 cells. Tetracycline resistant cells bearing the 
recombinant plasmid pHKY10 were grown up, plasmid DNA 
extracted and digested in turn with BglII and Pst followed by 
isolation by the PAGE procedure and electroelution of the large 
fragment, a linear piece of DNA having Pst and BglII sticky 
ends to give DNA fragment 5'. 

Plasmid pSOM7Δ2Δ4 could be manipulated to provide a 
second component for a system capable of receiving a wide 
variety of heterologous structural genes. The plasmid was 
subjected to partial EcoRI digestion followed by Pst digestion 
and the fragment containing the trp promoter/operator was 
isolated by the PAGE procedure followed by electroelution. 
Partial EcoRI digestion was necessary to obtain a fragment 
which was cleaved adjacent to the 5' end of the somatostatin 
gene but not cleaved at the EcoRI site present between the 
ampicillin resistance gene and the trp promoter operator. 
Ampicillin resistance lost by the PstI cut in the ap^R gene 
could be restored upon ligation with fragment 5'. 

In a first demonstration the third component, a 
structural gene for somatostatin (6'), was obtained and purified 
by PAGE and electroelution. 

The three gene fragments 4', 5' and 6' could now be 
ligated together in proper orientation, to form the plasmid 
pSOM7Δ2Δ4. 

The complete human proinsulin gene, including the 
N-terminal codons that code for methionine, was recovered from 
the plasmid pHI3 by treatment with EcoRI and BamHI and purified 
by gel electrophoresis. This gene, fragment 3 in Fig. 3, was 
joined to two other DNA fragments with T4 ligase; these are 
identified as fragments 4 and 5 in Fig. 3. 

Fragment 4 contains a promoter and a carrier protein 
gene derived from the plasmid pSOM7Δ2Δ4 by partial digestion 
with EcoRI and complete digestion with PstI. This fragment 
contains an E. coli tryptophan (trp) promoter-operator, nine 
codons from the trp leader peptide, 190 codons from the trp E 
gene and an EcoRI cleavage site introduced in place of the trp 
E termination codon. (This gene construction will be referred 
to as trp LE' below.) The tryptophan attenuator region 
including the last 5 codons of the trp leader peptide sequence 
and the first two thirds of the trp E gene are deleted in this 
construction. 

The trp E gene (trp LE'), contained in Fragment 4', is 
modified to incorporate an EcoRI site in place of the 
termination codon of the trp E gene as shown to give the 
correct reading frame with the inserted gene fragment 3. 

This fragment is bounded at the opposite end by a 
PstI site derived from the pBR322 and incorporates the first 
half of the β-lactamase gene. The fragment was recovered from 
20 µg of plasmid pSOM7Δ2Δ4 by partial digestion with EcoRI 
followed by treatment with PstI. The promoter containing 
fragments were isolated by polyacrylamide gel electrophoresis. 

Fragment 5' was obtained from plasmid pHKY10. This 
plasmid is a derivative of pBR322 and contains a tryptophan 
promoter-operator in place of the tetracycline promoter. The 
HindIII site of pBR322 has been converted to a BglII site. The 
plasmid, 20 µg, was treated with PstI and BglII and the large 
fragment, designated 5 in Fig. 3, purified by polyacrylamide 
gel electrophoresis. 

The two fragments 4 and 5 were ligated together to 
reassemble the gene for β-lactamase via a PstI site and confer 
ampicillin resistance (Ap^r). The ends then present an EcoRI 
site and a BglII site for insertion of a gene. These two sites 
cannot be ligated together due to nonhybridization of the 5' 
protruding ends and can only be joined by incorporating a DNA 
fragment that possesses 5' ends complementary to the EcoRI and 
BglII ends. The proinsulin gene (fragment 3), containing EcoRI 
and BamHI ends, is such a molecule. Thus, the three fragments, 
5 µg of 4, 1 µg of 3 and 1 µg of 5, were combined and treated 
with T4 DNA ligase at 4°C for 24 hours in ligase buffer. Upon 
circularization to close the plasmid, the tryptophan promoter- 
operator controls expression of a fusion (chimeric) protein of 
which proinsulin is a portion. Tetracycline resistance (Tc^r) 
is also conferred. 

The DNA mixture from the ligation was transformed into 
E. coli K-12 strain 294 by the procedure of Goeddel et al., 
Nature, 281, 544 (1979). Colonies were selected that would 
grow on both ampicillin and tetracycline. Of 3 colonies 
tested, 2 were found by SDS polyacrylamide gel electrophoresis 
(J.F. Maizel, Jr., Meth. Virol., 5, 180 (1971)) to express a 
protein of the molecular weight expected of the trp LE'-proinsulin 
fusion. One plasmid, pH17, was completely characterized 
as to DNA sequence of the incorporated gene and restriction 
analysis of the vector pBR322. 
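
The end-compatibility reasoning behind the ligation of fragments 3, 4 and 5 can be illustrated with a short Python sketch. It is a simplified model, not part of the described procedure: the recognition sequences and cut positions are the standard published ones for EcoRI, BglII and BamHI, and the names SITES, overhang, compatible and junction are hypothetical helpers introduced only for this illustration.

```python
# Simplified model of cohesive-end compatibility (illustrative only).
# Each entry: recognition sequence (5'->3') and cut position on the top strand.
SITES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "BglII": ("AGATCT", 1),  # A^GATCT
    "BamHI": ("GGATCC", 1),  # G^GATCC
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def overhang(enzyme: str) -> str:
    """Single-stranded overhang left by a palindromic site cut before its midpoint."""
    site, cut = SITES[enzyme]
    return site[cut:len(site) - cut]


def compatible(a: str, b: str) -> bool:
    """Two ends can anneal when one overhang is the reverse complement of the other."""
    return overhang(a) == revcomp(overhang(b))


def junction(left: str, right: str) -> str:
    """Sequence formed when an end cut by `left` is ligated to an end cut by `right`."""
    lsite, lcut = SITES[left]
    rsite, rcut = SITES[right]
    return lsite[:lcut] + rsite[rcut:]


if __name__ == "__main__":
    print(compatible("EcoRI", "BglII"))  # the two vector ends cannot close on each other
    print(compatible("BglII", "BamHI"))  # the BamHI end of the insert fits the BglII end
    print(junction("BglII", "BamHI"))    # hybrid junction at the BglII/BamHI joint
```

In this model the hybrid BglII/BamHI junction matches neither original recognition sequence, which is why such a joint is not recut by either enzyme.
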
5. Proinsulin Isolation 

The plasmid pH17 was transformed into E. coli K-12 
strain RV308 (ATCC No. 31608) and grown in 500 ml of LB medium 
(J.H. Miller, Experiments in Molecular Genetics, 433, Cold 
Spring Harbor 1972) containing 10 µg/ml of ampicillin to a cell 
density of 5 OD. This was diluted into a 10 liter fermentation 
vessel (New Brunswick) and grown in M9 media (Miller, supra, at 
431) to a cell density of 14 OD. Cells were collected by 
centrifugation and frozen. 

Cells (164 g) were thawed in 5 volumes sucrose lysis 
buffer (10 percent sucrose, 0.1 M tris HCl, pH 7.9, 50 mM EDTA, 
0.2 M NaCl) containing 0.1 mM phenylmethylsulfonyl fluoride and 
1.0 mM dimercaptopropanol, and lysed by sonication. The lysis 
pellet was collected by centrifugation and suspended by 
stirring overnight at 4°C with 4 volumes of 7.0 M guanidine-HCl, 
1 mM dimercaptopropanol, 1 mM EDTA. After centrifugation the 
supernatant was diluted 20 times with cold water and allowed to 
stand 2 hours at 4°C. The precipitate (9.6 g dry) was collected 
by centrifugation and reacted overnight at room temperature in 
220 ml 88 percent formic acid with 5 g CNBr to cleave the 
proinsulin from the trp LE' fusion. After rotary evaporation 
the residue was suspended in 200 ml 7.5 M urea, 1 mM EDTA, 20 mM 
ammonium carbonate, and the pH adjusted to 9.0 with 
ethanolamine (5.5 ml). Ten grams of sodium sulfite and five 
grams of sodium tetrathionate were added to convert cysteines 
and cystines to S-sulfonate groups and the reaction stirred at 
room temperature for 5 hours. 

The reaction mixture was desalted on a G-25 Medium 
column in 7.5 M urea, 10 mM Tris pH 8.5, 10 mM EDTA. The desalted 
protein was loaded onto a DEAE-Sephadex (A-25) column and 
eluted with a 2-liter linear gradient of 0 to 0.5 M NaCl in the 
same tris-urea buffer. The proinsulin-like material, 
identified by RIA or HPLC (see below), was concentrated on an 
Amicon YM5 membrane and resolved in the tris-urea buffer on a 
G-50 Medium column. The G-50 fractions, identified by HPLC, 
were pooled (104 ml) and the buffer changed on a column of G-25 
Fine equilibrated with 30 mM ammonium carbonate, pH 8.8. The 
lyophilized protein weighed 216 mg. The recoveries at each 
step are shown in Table 2. 
6. Proinsulin Analysis 

The S-sulfonated proinsulin obtained was analyzed by 
amino acid analysis. This analysis was made by Eli Lilly and 
Co. and is shown in Table 3. 
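
The predicted residue counts against which such an analysis is compared can be reproduced by counting residues in the proinsulin sequence. The sketch below is illustrative only: it assumes the commonly published one-letter sequences of the human B chain, connecting region and A chain (they are not spelled out in this section), and it pools Asn with Asp and Gln with Glu because acid hydrolysis converts the amides to the acids. The name hydrolysate_composition is a hypothetical helper.

```python
from collections import Counter

# Assumed sequences (standard published human proinsulin; not given in this text).
B_CHAIN = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"                   # 30 residues
C_REGION = "RR" + "EAEDLQVGQVELGGGPGAGSLQPLALEGSLQ" + "KR"   # 35 residues
A_CHAIN = "GIVEQCCTSICSLYQLENYCN"                            # 21 residues
PROINSULIN = B_CHAIN + C_REGION + A_CHAIN

# One-letter code -> name reported by an acid-hydrolysis analysis.
# Asn is recovered as Asp and Gln as Glu, so they share a label.
HYDROLYSIS_NAME = {
    "A": "Ala", "R": "Arg", "N": "Asp", "D": "Asp", "C": "Cys",
    "E": "Glu", "Q": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def hydrolysate_composition(sequence: str) -> dict:
    """Count residues per molecule as an acid hydrolysate would report them."""
    return dict(Counter(HYDROLYSIS_NAME[aa] for aa in sequence))


if __name__ == "__main__":
    composition = hydrolysate_composition(PROINSULIN)
    print(len(PROINSULIN))
    for name in sorted(composition):
        print(name, composition[name])
```
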

TABLE 3 

        Amino Acids   Amino Acids             Amino Acids   Amino Acids 
        Calculated    Predicted               Calculated    Predicted 

Asp       4.40           4            Ile       1.34           2 
Thr       2.90           3            Leu      12.21          12 
Ser       4.50           5            Tyr       3.93           4 
Glu      15.64          15            Phe       2.61           3 
Pro       3.42           3            His       2.02           2 
Gly      11.08          11            Lys       1.96           2 
Ala       4.46           4            Arg       3.92           4 
Cys       2.85           6            Val       5.58           6 

The presence of proinsulin was also confirmed by 
radioimmunoassay. To determine the radioimmunoactivity the Corning 
125I-insulin kit was used. The antibody was found to be about 
4 percent cross-reactive with proinsulin and about 0.2 percent 
cross-reactive with reduced proinsulin. Unknowns were heated 2 
minutes at 90°C, in 7.5 M urea, 2 mM β-mercaptoethanol, pH 8-10 
(ethanolamine), and aliquots diluted in phosphate-buffered 
saline (0.1 percent gelatin) and immediately assayed. These results 
were determined from comparisons to a reduced proinsulin 
standard curve generated in the same way from either bovine 
proinsulin or human proinsulin S-sulfonate. 

The proinsulin-S-sulfonate was also assayed by HPLC. In 
Fig. 4 profiles are shown for S-sulfonated bovine proinsulin, 
bacterial derived human proinsulin and a combination of the 
two. In this analysis, samples of bovine and human proinsulin 
sulfonate and a mixture of the two were applied to a 10 µm 
RP-18 column and eluted using a linear gradient of 21 to 33 
percent n-propanol and acetonitrile (2:1) in 50 mM NH4OAc 
(pH 7). The proteins are seen to run very nearly coincident. 
The large peak at the end of the chromatogram is due to 
rapid changes in the solvent composition and not eluted protein. 

B. Preparation of Proinsulin Analog 

Described below is the synthesis of a gene which codes for 
the expression of an analog of proinsulin comprising the A and 
B chains of human insulin connected by a bridging chain which 
differs from the C chain of human proinsulin in that it 
contains only 6 amino acid units rather than the 35 unit 
polypeptide of human proinsulin shown in Fig. 1. Specifically 
the 6 units are, reading in order from the last unit of the B 
chain to the A chain, Arg-Arg-Gly-Ser-Lys-Arg. This sequence 
has the same end sequences Arg-Arg and Lys-Arg as does human 
proinsulin thus permitting excision of the bridging chain by 
proteolytic means. 

A chain of 6 amino acids is an acceptable length for a 
modified (or analog) bridging "C" chain which will permit 
folding and the subsequent formation of the disulfide 
crosslinks between A and B chains characteristic of hormone 
insulin. However, those skilled in the art will appreciate 
that bridging chains shorter or longer than 6 would also be 
useful, provided they permit folding and the formation of the 
necessary disulfide bonds. Sequences of 100 or even more amino 
acid units can be employed in the bridging chain. However, the 
practical difficulty of obtaining gene fragments coding for 
very long sequences makes the bridging chain analogs of fewer 
than 35 amino acid units more attractive from a practical point 
of view. 

The ends of the bridging chain, no matter how many 
intermediate amino acid units or in what order, must be 
constructed to permit excision of the bridging chain. 
Although alternative means may be employed, we prefer to use 
the sequences Arg-Arg and Lys-Arg as found in proinsulin itself 
as proteolytic cleavage using trypsin and carboxypeptidase B 
occurs cleanly at these sites. 
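
As a string-level illustration of this design rule, the Python sketch below checks that a bridging chain begins with Arg-Arg and ends with Lys-Arg and then removes it to leave the two insulin chains. It deliberately does not model enzyme specificity or kinetics. The A and B chain sequences are the commonly published human ones and are an assumption here, and build_precursor and excise_bridge are hypothetical helper names.

```python
# Assumed one-letter sequences of the human insulin chains (not given in this text).
B_CHAIN = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"  # 30 residues
A_CHAIN = "GIVEQCCTSICSLYQLENYCN"           # 21 residues


def build_precursor(bridge: str) -> str:
    """B chain, then bridging chain, then A chain, as in proinsulin."""
    return B_CHAIN + bridge + A_CHAIN


def excise_bridge(precursor: str) -> tuple:
    """Return (B chain, A chain) after removing a bridge flanked by RR ... KR.

    Raises ValueError when the bridge lacks the dibasic sites that the
    text prefers for proteolytic excision.
    """
    if not (precursor.startswith(B_CHAIN) and precursor.endswith(A_CHAIN)):
        raise ValueError("precursor does not carry the expected A and B chains")
    bridge = precursor[len(B_CHAIN):len(precursor) - len(A_CHAIN)]
    if not (bridge.startswith("RR") and bridge.endswith("KR")):
        raise ValueError("bridge lacks the Arg-Arg / Lys-Arg excision sites")
    return precursor[:len(B_CHAIN)], precursor[-len(A_CHAIN):]


if __name__ == "__main__":
    analog = build_precursor("RRGSKR")  # Arg-Arg-Gly-Ser-Lys-Arg
    print(len(analog))
    print(excise_bridge(analog))
```

With the six-unit bridge the precursor has 30 + 6 + 21 residues, matching the 57 amino acids named in the heading that follows.
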

1. Preparation of Synthetic Gene Coding for the 57 Amino 
Acids of an Analog Proinsulin 

a. Oligonucleotide Synthesis 

The chemical synthesis methods as well as the 
synthesis of the DNA gene fragments coding for the A and B 
chains of human insulin have been described. K. Itakura 
et al., J. Biol. Chem., 250, 4592 (1975), K. Itakura et al., 
J. Am. Chem. Soc., 97, 7327 (1975), Crea et al., Proc. Nat. 
Acad. Sci., USA, 75, 5765 (1978) and Goeddel et al., Proc. 
Nat. Acad. Sci., USA, 76, 106 (1979). Five new oligonucleotide 
fragments were synthesized by similar methods also using the 
triester process described in the above cited references. 
These sequences are shown below in Table 4 and Fig. 5. 

TABLE 4 

Oligonucleotides 

AAGACTCGTCGTG 
GATCCAAGCGTGGCATC 
GATCCACGACGAGTCTT 
CAACGATGCCACGCTTG 
TCGACTATTAGTT 
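
The relationship between these five oligonucleotides can be checked with a short Python sketch: two pairs anneal to leave GATC overhangs, and their coding strands translate to the residues expected at the ends of the bridging chain. This is an illustration under stated assumptions, not the authors' procedure. In particular the second oligonucleotide is read here as GATCCAAGCGTGGCATC where the print is unclear, the list is indexed by table order rather than by fragment label, and revcomp and translate are hypothetical helpers built on the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Table 4 in order; the second entry is an assumed reading of unclear print.
OLIGOS = [
    "AAGACTCGTCGTG",
    "GATCCAAGCGTGGCATC",
    "GATCCACGACGAGTCTT",
    "CAACGATGCCACGCTTG",
    "TCGACTATTAGTT",
]


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    o1, o2, o3, o4, o5 = OLIGOS
    # Oligo 3 is the complement of oligo 1 plus a GATC overhang.
    print(o3 == "GATC" + revcomp(o1))
    # Oligo 4 pairs with oligo 2 downstream of its GATC overhang.
    print(revcomp(o4).startswith(o2[4:]))
    # Coding sense: end of B chain plus Arg-Arg; Gly-Ser-Lys-Arg plus start of A chain.
    print(translate(o1))
    print(translate("G" + o2))
    # Complement of oligo 5: last A chain codon followed by two stop codons.
    print(translate(revcomp(o5)))
```

Prefixing a G to the second oligonucleotide in the sketch stands in for the base supplied by the neighbouring fragment when the GGATCC (BamHI) site is formed.
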

b. Joining of Synthetic Oligonucleotides 

Figure 5 shows the synthetic oligonucleotides of 
the insulin A and B chain genes previously prepared and the 
manner in which the new fragments C1-C5 were used in the 
enzymatic construction of a complete gene for a proinsulin 
analog. The scheme for obtaining this gene is set forth in 
Fig. 6. 

A plasmid pBH1 containing the left half of the B 
chain gene was used in this process and is described by Goeddel 
et al., Proc. Nat. Acad. Sci., USA, 76, 106 (1979). It 
contains the codons for the 1-13 amino acids of the B chain and 
a methionine unit at the N-terminal which will be used later to 
cleave the proinsulin analog from the bacterially expressed 
chimeric protein using cyanogen bromide (CNBr). 
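
The role of this N-terminal methionine can be illustrated with a small Python model of cyanogen bromide cleavage, which cuts on the carboxyl side of methionine residues. The carrier sequence below is a made-up placeholder, not the real carrier protein, the insulin chain sequences are the commonly published human ones (an assumption here), and cnbr_fragments is a hypothetical helper. The model ignores the chemical conversion of methionine to homoserine that accompanies the real reaction.

```python
# Assumed one-letter sequences (not given in this text).
B_CHAIN = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"
A_CHAIN = "GIVEQCCTSICSLYQLENYCN"
ANALOG = B_CHAIN + "RRGSKR" + A_CHAIN  # 57-residue proinsulin analog

# Placeholder standing in for the bacterial carrier protein (hypothetical).
HYPOTHETICAL_CARRIER = "MTKAILMSGEV"


def cnbr_fragments(protein: str) -> list:
    """Split a one-letter sequence after every Met; the Met stays on the upstream piece."""
    fragments, start = [], 0
    for i, residue in enumerate(protein):
        if residue == "M":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])
    return fragments


if __name__ == "__main__":
    fusion = HYPOTHETICAL_CARRIER + "M" + ANALOG
    pieces = cnbr_fragments(fusion)
    print(pieces)
    # The analog contains no methionine, so it is released intact as the last piece.
    print(pieces[-1] == ANALOG)
```

The design depends on the desired product containing no internal methionine, so that cleavage fragments only the carrier.
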

The gene for the right half of the B chain was 
obtained from the oligonucleotides B1, B2, B3, B4, C1, 
B6 and B7, B8, B9 and C3 (C1 and C3 replacing the 
B5 and B10 sequences in the previously prepared gene 
fragment) by ligation using T4 ligase in conventional techniques. 
This gene fragment codes for the 14-30 amino acid units of the 
B chain and the first two units Arg-Arg of the bridging 
(modified "C") chain (BC). 

After purification by polyacrylamide gel electrophoresis 
and elution of the largest DNA band the fragment was 
inserted into the plasmid pBR322 that had been cleaved with 
restriction endonucleases HindIII and BamHI, thereby utilizing 
the HindIII and BamHI sites on the gene fragment, and cloned in 
E. coli 294 (ATCC No. 31446). The plasmid pBC1 recovered from 
an ampicillin-resistant, tetracycline sensitive clone possessed 
the desired nucleotide sequence according to the method of A.M. 
Maxam et al., Proc. Nat. Acad. Sci., USA, 74, 560 (1977). 

The A gene was constructed similarly from 
oligonucleotides C2, A2, A3, A4, A5, A6, C4, A8, A9, A10, 
A11 and C5 (C2, C4 and C5 replacing the A1, A7 and A12 
sequences in the previously prepared gene fragment) using T4 
ligase in conventional techniques. This fragment codes for the 
21 amino acids of the A chain and the four units of the 
bridging "C" chain, Gly-Ser-Lys-Arg. 
After purification, also by polyacrylamide electrophoresis, the 
fragment was inserted into the plasmid pBR322 which had been 
cleaved with restriction endonucleases EcoRI and SalI using the 
EcoRI and SalI sites on the fragment and cloned in E. coli 
294. An ampicillin-resistant, tetracycline-sensitive clone 
yielded plasmid pCA18 having the desired nucleotide sequence by 
the method of A.M. Maxam et al., supra. 

2. Construction of the Proinsulin Analog Gene and 
Corresponding Expression Plasmid 

The desired expression plasmid was prepared from 
plasmids pBH1, pBC1 and pCA18 as shown in Fig. 6. The plasmid 
pBH1 was cleaved with HindIII and ligated to the fragment BC 
excised from plasmid pBC1 by treatment with HindIII and BamHI. 
The resulting plasmid pBC135 was cleaved with EcoRI and ligated 
to an EcoRI fragment of λplac5 which contains the lac control 
region and the majority of the β-galactosidase structural gene 
(designated I). K. Itakura et al., Science, 198, 1056 (1977). 
This ligation produced plasmid pIB254 which was cleaved with 
BamHI and SalI and treated with alkaline phosphatase. The 
product of this cleavage was ligated to the fragment CA excised 
from plasmid pCA18 by treatment with BamHI and SalI as shown in 
Fig. 6 and transformed into E. coli 294 and cloned. The plasmid 
pBCA5 was recovered from ampicillin-resistant, 
tetracycline-sensitive clones in E. coli 294 grown on X-gal 
plates containing ampicillin and contained the DNA coding 
sequence for the proinsulin analog as indicated by the method 
of A.M. Maxam et al., supra. 

3. Expression of Proinsulin Analog 

The fully characterized plasmid pBCA5 was inserted into 
E. coli RV308 and grown in four-liter flasks containing 1.5 
liter LB containing 20 mg/l ampicillin. Recovered cells (322 g 
wet) were lysed by sonication in two liters of 10 percent 
sucrose, 50 mM EDTA, 0.1 M tris/HCl, pH 7.9, 0.1 mM 
phenylmethylsulfonylfluoride, 0.2 M NaCl, and 1 mM 
1,3-dithio-2-propanol. After centrifugation (30 min, 5000 rpm) 
the pellet was suspended by stirring in 400 ml 7 M guanidine 
hydrochloride, 0.1 mM 1,3-dithio-2-propanol, 1 mM EDTA. This 
suspension was centrifuged (30 min, 12,000 rpm) and the 
supernatant diluted six-fold into cold water. The 
precipitated protein (12.4 g dry weight) was collected by 
centrifugation (20 min, 5000 rpm) and treated overnight at RT 
with 2.8 g (26.4 mmol) CNBr in 200 ml 88 percent formic acid to 
cleave the proinsulin analog (hereinafter "analog-'C'" 
proinsulin) from the β-galactosidase residues at the 
methionine unit. After rotary evaporation to dryness at 
under 30°C, water was added and the volume again reduced. The 
residue was suspended in 300 ml 6 M guanidine hydrochloride and 
the pH adjusted to 9.0 with ethanolamine. To this was added 
15.0 g sodium sulfite and 7.5 g sodium tetrathionate to convert 
cysteines and cystines to cysteine S-sulfonate groups and the 
mixture allowed to react at room temperature for six hours. 
The reaction mixture was exhaustively dialyzed (Spectropor 3) 
against 1 mM EDTA at 4°C and the precipitated protein 
S-sulfonates collected by centrifugation (20 min, 5000 rpm). 
The pellet was suspended in 50 ml 7.5 mM Tris/HCl, pH 7.5, 
filtered and loaded onto a DE-52 column (2.5 x 87 cm) in the 
same buffer at 4°C. The column was eluted with a linear 
gradient of 0 to 0.5 M NaCl in the same buffer. The [...] 
of mini-C proinsulin S-sulfonate was detected by insulin RIA 
analyses described below to elute at the end of the OD peak. 
The pooled fractions were applied to a G-50F Sephadex column 
(2.5 x 100 cm) and eluted with 7 M urea, 50 mM tris/HCl, pH 
7.5, 1 mM EDTA. Fractions containing insulin-RIA-active 
material were pooled and dialyzed against 20 mM ammonium 
carbonate, pH 8.8, and lyophilized. The white powder was 
resuspended in 7 [...] 20 mM ammonium carbonate, pH 8.8, and 
stored at -40°C. A portion of the product pool was further 
purified by preparative HPLC and the insulin RIA-active peak 
analyzed for amino-acid content which is shown in Table 5. 

The analog-"C" proinsulin S-sulfonate was purified by 
HPLC by prep-collection from an analytical C-8 column. The 
resolving gradient was 21-39 percent 2-propanol in 50 mM NaOAc, 
pH 7.0, at 0.6 percent/min and 2 ml/min. Up to [...] of 
protein solution was resolved in [...] by [...] 
injections via a 500 µl loop. 






TABLE 5 

Amino Acid Analysis of 24-hour 6N HCl hydrolysis (110°) of 
purified analog-"C" proinsulin. Cysteine was quantitated by 
separate determination of cysteic acid on performic acid 
oxidized sample and calculated by cysteic/alanine ratio. 
Values increased to compensate for acid decomposition were 
serine (10 percent) and threonine (5 percent). N.D. = not 
determined. 

Amino Acid    aa mole percent x 57    predicted 

Ala                 1.19                  1 
Arg                 4.06                  4 
Asp                 3.17                  3 
Cys/2               4.87                  6 
Glu                 6.95                  7 
Gly                 5.37                  5 
His                 2.03                  2 
Ile                 1.39                  2 
Leu                 5.98                  6 
Lys                 2.02                  2 
Met                 0                     0 
Phe                 3.03                  3 
Pro                 1.96                  1 
Ser                 4.21                  4 
Thr                 3.09                  3 
Trp                 N.D.                  0 
Tyr                 3.99                  4 
Val                 3.74                  4 






4. Folding Of Analog "C" Proinsulin 

Analog "C" proinsulin folding to obtain a crosslinked 
form was accomplished by reaction of 4 mg of protein, 
about 30 percent analog "C" proinsulin sulfonate. The protein 
was dissolved in a degassed buffer of 40 mM glycine, pH 10.5, 
3 [...], 0.3 M NaCl at 0°. To this was added β-mercaptoethanol 
to a concentration of 6.4 [...] and the reaction sealed under 
N2. The course of the reaction was followed by measuring the 
increase in RIA activity and was complete within about four 
hours. The reaction was stopped by the addition of acetic acid. 

The reaction mixture was purified by HPLC by 
prep-collection from a C-18 Ultrasphere column. The resolving 
gradient was 21-28 percent acetonitrile in 0.2 [...] ammonium 
sulfate, 50 mM NaOAc, pH 4.0, at 0.2 percent/min and 1.0 ml/min. 

5. Assay For Analog "C" Proinsulin 

Analog "C" proinsulin has no crossreactivity in the 
insulin radioimmunoassay as the S-sulfonate. However the 
formation of analog "C" proinsulin as an expression product was 
confirmed by crossreactivity of the thiol form. The samples of 
unknowns were treated for two minutes at 90°C with [...] mM 
β-mercaptoethanol, diluted into RIA buffer ([...] sodium 
phosphate, pH 7.4, 0.1 M NaCl, 0.1 percent NaN3, 0.1 percent 
gelatin), and immediately assayed. Reproducibility depends 
upon strict timing since extended incubation of the diluted 
reduced test solution leads to variable oxidative folding of 
the molecule into forms with higher RIA activity. 

The thiol form gave an activity of 0.9 percent compared 
to insulin = 100 percent. By comparison, bovine proinsulin had 
RIA activity of 7.3 percent of insulin activity and the thiol 
form of bovine proinsulin had an activity of 1.[...] percent. The 
thiol form of the B chain of porcine proinsulin had activity of 
0.1 percent. On incubation with a slight excess of 
β-mercaptoethanol to obtain the folded, crosslinked product, 
reaction mixtures show RIA activity of 20-40 percent that of 
insulin. 

C. Preparation of Human Insulin 

1. Folding and Linking of A and B Chains. 

The human proinsulin or analogs thereof, prepared in 
accordance with the present invention, for example, following 
the procedures of Parts A or B above, as their respective 
S-sulfonates are induced to fold with proper formation of 
internal disulfide bonds (between cysteine 7 B and cysteine 
7 A, between cysteine 19 B and cysteine 20 A, and between 
cysteine 6 and 11 A) by means of controlled sulfhydryl 
interchange catalyzed by β-mercaptoethanol. 

To a 0.1 mg/ml solution of proinsulin S-sulfonate in 
degassed 50 mM sodium glycinate, pH 10.6, at 4°C was added 
β-mercaptoethanol to a final concentration of 0.3 mM. After 
four hours, the reaction is essentially complete as measured by 
the increase in cross-reacting activity of the mixture in 
insulin RIA. The yield of proinsulin is about 80 percent. 
Proinsulin is then purified from side products by gel 
permeation, ion exchange, and/or reverse phase high pressure 
liquid chromatography to yield product in substantially 
purified form. 

2. Excision of Bridging Chain 

The human proinsulin or analogs thereof, prepared in 
accordance with the present invention, for example, following 
the procedure of Part C.1. above, are proteolytically converted 
to insulin, for example in accordance with the procedure of 
Kemmler et al., J. Biol. Chem., 246, 6786 (1971). The obtained 
insulin is then purified by column chromatography or zinc 
crystallization to yield product in substantially purified 
form, identical to natural human insulin and freed of 
biologically active contaminants. 

While the invention in its most preferred embodiment is 
described with reference to E. coli, other microorganisms could 
likewise serve as host cells, for example, yeasts such as 
Saccharomyces cerevisiae, Bacilli such as Bacillus subtilis and 
preferably other enterobacteriaceae among which may be 
mentioned as examples Salmonella typhimurium and Serratia 
marcescens, utilizing plasmids that can replicate and express 
heterologous gene sequences in these organisms. The 
invention is not to be limited to the preferred embodiments 
described. 

CLAIMS: 

1. A chimeric polypeptide comprising: 

a) the polypeptide sequence of a proinsulin comprising 
the A and B chains of human insulin connected by a bridging 
chain of at least 2 amino acid units, said bridging chain 
having sites at each end which permit its excision from 
between said A and B chains; and 

b) an additional protein or protein fragment; 
there being a cleavage site at or adjacent said additional 
protein or fragment and adjacent one end of the polypeptide 
sequence of said proinsulin. 

2. A chimeric polypeptide according to claim 1 wherein the 
amino acid sequence of the bridging chain corresponds to that 
of the C peptide of human proinsulin. 

3. A chimeric polypeptide according to claim 1 wherein the 
amino acid sequence of the bridging chain is 
Arg-Arg-Gly-Ser-Lys-Arg. 

4. A chimeric polypeptide according to claim 1 wherein the 
amino acid sequence of the bridging chain is Arg-Arg. 

5. A chimeric polypeptide according to claim 1 or claim 2 
wherein the sites permitting excision of the bridging chain 
are the amino acid units Arg-Arg at the B chain end and 
Lys-Arg at the A chain end. 

6. A chimeric polypeptide according to any one of claims 1 
to 5 wherein said cleavage site is a methionine unit. 

7. A chimeric polypeptide according to claim 6 wherein the 
methionine unit is adjacent the N-terminal of said proinsulin. 

8. A chimeric polypeptide according to any one of claims 1 
to 7 wherein the additional protein or fragment is either 

a) at least a substantial portion of β-galactosidase; or 

b) a fragment of the trp leader polypeptide fused to a 
portion of the trp E polypeptide. 

9. A process of producing a chimeric polypeptide according 
to claim 1 comprising the steps: 

1) inserting a gene coding for said proinsulin into a 
microbial cloning vehicle in which the gene is in reading 
phase with a DNA sequence coding for said additional protein 
or fragment comprising said cleavage site; 

2) transforming said cloning vehicle containing said 
inserted gene into a microbial host for expression of said 
chimeric polypeptide; 

3) expressing the chimeric polypeptide; and 

4) isolating the expressed chimeric polypeptide. 

10. A process for producing a proinsulin comprising the 
process of claim 9 and the additional step of cleaving said 
chimeric polypeptide to release said proinsulin. 

11. A process for producing a protein comprising the process 
of claim 10 and the additional step of excising said bridging 
chain from said proinsulin. 

12. A process according to claim 11 wherein the excision of 
the bridging chain is preceded by the formation of disulfide 
bonds between the A and B chains and the product of excision 
is human insulin. 

13. Human proinsulin when prepared by the process of claim 
10. 

14. Human insulin when prepared by the process of claim 12. 

15. A cloning vehicle suited for transformation of a 
microbial host and use therein for expressing a chimeric 
polypeptide according to claim 1. 

16. The plasmid pH17. 

17. The plasmid pBCA5. 

18. A viable culture of microbial transformants containing 
a cloning vehicle according to any one of claims 15 to 17. 

19. A method of producing human insulin comprising: 1) 
cultivating a culture according to claim 18; 2) separating the 
resulting cellular mass; 3) isolating the precursors to human 
insulin comprising the chimeric polypeptide according to claim 
1; 4) cleaving the additional protein therefrom; 5) effecting 
folding and linkage of the A and B chains; and 6) excising the 
bridging chain. 

20. A product of microbial expression comprising the 
polypeptide sequence of a proinsulin comprising the A and B 
chains of human insulin connected by a bridging chain of at 
least 2 amino acid units and from which human insulin is 
derivable upon excision of said bridging chain. 